Machine-learning-enhanced time-of-flight mass spectrometry analysis
Authors
Abstract
• A machine-learning method provides reliable atomic/molecular labels for ToF-MS
• No human labeling or prior information required
• The training dataset is artificially generated based on isotopic abundances
• Method validated on a variety of materials and two ToF-MS-based techniques

Time-of-flight mass spectrometry (ToF-MS) is a mainstream analytical technique widely used in biology, chemistry, and materials science. ToF-MS provides quantitative compositional analysis with high sensitivity across a wide dynamic range of mass-to-charge ratios. A critical step in ToF-MS is to infer the identity of the detected ions. Here, we introduce a machine-learning-enhanced algorithm to provide a user-independent approach to performing this identification using patterns from the natural isotopic abundances of individual atomic and molecular ions, without prior knowledge of the composition. Results from several materials and techniques are compared with those obtained by field experts. Our open-source, easy-to-implement, reliable analytic method accelerates this identification process. A wide range of ToF-MS-based applications can benefit from our approach, e.g., hunting for biomarkers or contamination on solid surfaces in high-throughput data.

Mass spectrometry is a widespread approach used to work out what the constituents of a material are. Atoms and molecules are removed from the material and collected, and subsequently, a critical step is to infer their correct identities based on the patterns formed in their mass-to-charge ratios and relative isotopic abundances. However, this identification step still mainly relies on individual users' expertise, making its standardization challenging and hindering efficient data processing. Here, we introduce an approach that leverages modern machine learning to identify peak patterns in time-of-flight mass spectra within microseconds, outperforming human users without loss of accuracy. Our approach is cross-validated on mass spectra generated from different ToF-MS techniques, offering the ToF-MS community an open-source, intelligent mass spectra analysis.

Mass spectrometry is a mainstream technique for revealing what constitutes a solution or a material. An array of mass spectrometry techniques is used in the life sciences, geology, chemistry, and materials science. Among this arsenal, ToF-MS is one in which an ion's mass-to-charge (m/z) ratio is determined via a ToF measurement (ref. 1: Wolff and Stephens, Rev. Sci. Instr. 1953; 24: 616-617, https://doi.org/10.1063/1.1770801). It gives the composition of the sampled material with high precision in the measured masses (ref. 2: Maher et al., Rev. Mod. Phys. 2015; 87: 113-135, https://doi.org/10.1103/RevModPhys.87.113).

The principles of ToF-MS are common to techniques such as matrix-assisted laser desorption/ionization (MALDI), secondary ion mass spectrometry (SIMS), and atom probe tomography (APT). Each of these techniques uses a different concept to emit ions from the sample, and this versatility means that the underlying technique, viz. ToF-MS, has found use in chemical reaction studies, large-molecule characterization, quantification of dopants in semiconductors, and the atomic-scale distribution of impurities at grain boundaries in metallic alloys, for instance (refs. 3-8: Sulzer et al., Anal. Chem. 2012; 84: 4161-4166, https://doi.org/10.1021/ac3004456; Pedersen et al., Science 1994; 266: 1359-1364, https://doi.org/10.1126/science.266.5189.1359; Tanaka et al., Rapid Commun. Mass Spectrom. 1988; 2: 151-153, https://doi.org/10.1002/rcm.1290020802; Kissel and Krueger, Nature 1987; 326: 755-760, https://doi.org/10.1038/326755a0; Karas and Hillenkamp, Anal. Chem. 1988; 60: 2299-2301, https://doi.org/10.1021/ac00171a028; Liebscher et al., Phys. Rev. Lett. 2018; 121: 015702, https://doi.org/10.1103/PhysRevLett.121.015702). A mass spectrum is essentially a plot of ion counts as a function of the mass-to-charge ratio: typically, a peak appears for each isotope of each element present, and its amplitude is proportional to the amount of that species in the analyzed volume.
Fast and accurate interpretation of the rich correlations in spectral data is of great importance and can lead to discoveries (ref. 9: Aebersold and Mann, Nature 2016; 537: 347-355, https://doi.org/10.1038/nature19949). Yet mass spectrum analysis still relies on the user's expertise, which makes it slow, prone to error, and hard to reproduce. The challenges in the development of automatic identification are two-fold. First, in ToF-MS, ions of the same species typically show a velocity distribution at the instant of departure from the specimen. These lead to different flight times. As a result, depending on the experimental conditions, peaks take various shapes that are not always simple to recognize (Figure 1) (ref. 10: Boesl, Mass Spectrom. Rev. 2017; 36: 86-109, https://doi.org/10.1002/mas.21520). Second, molecular ions are commonly encountered in mass spectra, i.e., not only atomic signals are detected (refs. 11-15: Tsong, Phys. Rev. B 1984; 30: 4946-4961, https://doi.org/10.1103/PhysRevB.30.4946; Sha et al., Surf. Sci. 1992; 266: 416-423, https://doi.org/10.1016/0039-6028(92)91055-G; Müller et al., Ultramicroscopy 2011; 111: 487-492, https://doi.org/10.1016/j.ultramic.2010.11.019; Gordon et al., ACS Nano 2012; 6: 10667-10675, https://doi.org/10.1021/nn3049957; Rusitzka et al., Sci. Rep. 2018; 8: 1-10, https://doi.org/10.1038/s41598-018-36110-y). Combining atoms into molecular ions usually leads to a new pattern comprising a combination of the isotopes of each element. Building a database of all possible molecular formulas is practically impossible.

Machine learning (ML) is well known for its powerful ability to find patterns in signals (ref. 16: Jordan and Mitchell, Science 2015; 349: 255-260, https://doi.org/10.1126/science.aaa8415). Recently, the mass spectrometry community has embraced ML for large-scale data analysis. The data-analyzing speed of ion-trap-based mass spectrometry has been dramatically accelerated (refs. 17, 18: Elias et al., Nat. Biotechnol. 2004; 22: 214-219, https://doi.org/10.1038/nbt930; Gessulat et al., Nat. Methods 2019; 16: 509-518, https://doi.org/10.1038/s41592-019-0426-7), whereas earlier analysis relied largely on database searching (refs. 19, 20: Sadygov et al., Nat. Methods 2004; 1: 195-202, https://doi.org/10.1038/nmeth725; Sinitcyn et al., Annu. Rev. Biomed. Data Sci. 2018; 1: 207-234, https://doi.org/10.1146/annurev-biodatasci-080917-013516). Several pioneering works have demonstrated the potential of applying statistical/ML methods to ToF-MS. For example, unsupervised ML has been used for exploratory analysis of ToF-SIMS and ToF-MALDI data (refs. 21-24: Biesinger et al., Anal. Chem. 2002; 74: 5711-5716, https://doi.org/10.1021/ac020311n; McCombie et al., Anal. Chem. 2005; 77: 6118-6124, https://doi.org/10.1021/ac051081q; Bluestein et al., Analyst 2016; 141: 1947-1957, https://doi.org/10.1039/c5an02406d; Verbeeck et al., Mass Spectrom. Rev. 2020; 39: 245-291, https://doi.org/10.1002/mas.21602). Lately, Bayesian methods have been adopted for APT (refs. 25, 26: Vurpillot et al., Microsc. Microanal. 2019; 25: 367-377, https://doi.org/10.1017/S1431927619000138; Mikhalychev et al., Ultramicroscopy 2020; 215: 113014, https://doi.org/10.1016/j.ultramic.2020.113014). The method implemented by Mikhalychev et al. (ref. 26) is able to deconvolute many types of peaks in ToF-APT spectra simultaneously. With reasonable prior information, it provides robust results. However, this prior information often has to be provided by users. If bad prior information is assumed, the computation can become very expensive.

Here, we introduce an ML-based approach that automates the process of assigning elemental and molecular identities to the peaks in a series of mass spectra. Moreover, uncertainties are attached to the labels, indicating the extent to which they are affected by the noise level and peak shape features. We name this approach "ML-ToF." It is shown that ML-ToF can handle different material systems and techniques. Indeed, the materials investigated to cross-validate the approach include a high-strength Al alloy developed for aerospace applications, a medium-Mn steel for automotive applications, Cu-In-based solar cell absorbers, and SmCo-based permanent magnets. Furthermore, we benchmark the results by comparing ML-ToF-assigned labels with those yielded by field experts. ML-ToF drastically reduces the duration of peak recognition. In general, it takes microseconds to obtain a labeled spectrum, a task that could otherwise take minutes or even hours. An overview of ML-ToF is shown in Figure 2.

A mass spectrum can be regarded as a one-dimensional signal whose values are all positive. We focus on peaks with a sufficient signal-to-background ratio to demonstrate properly discernible patterns. We import a Python function (from the SciPy package, the de facto standard package for signal processing in Python) that finds the peak positions and the corresponding intensity values (ref. 27: Virtanen et al., Nat. Methods 2020; 17: 261-272, https://doi.org/10.1038/s41592-019-0686-2). The function takes the one-dimensional signal as input and searches for local maxima by comparison of intensity values. A subset of these peaks can be further chosen by specifying conditions on the peak properties. There are three major properties: height, interpeak distance, and prominence. The prominence is defined as the difference between a peak's height and its adjacent minima, as indicated in Figure 3A. In Figure 3B, one can find the definition of height (the absolute count value in log scale). Throughout the examples, we use the same parameters (see Figure 3): height = 4 (log count); distance = 0.25 Da; prominence = 0.5 (log count).
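The peak-finding step described above can be sketched with `scipy.signal.find_peaks`. This is only a minimal illustration under stated assumptions, not the authors' implementation: the helper name `find_spectrum_peaks`, the synthetic Fe-like spectrum, the base-10 logarithm, and the `+ 1` offset that avoids `log(0)` are all hypothetical choices. Because the `distance` argument of `find_peaks` is expressed in samples, the 0.25 Da spacing is converted to a bin count, assuming uniformly spaced bins.

```python
import numpy as np
from scipy.signal import find_peaks


def find_spectrum_peaks(mz, counts, height=4.0, distance_da=0.25, prominence=0.5):
    """Return (m/z, counts) of peaks found in log-scaled counts.

    Assumptions (not taken from the source): log base 10, a +1 offset so
    empty bins do not produce log(0), and uniformly spaced m/z bins.
    """
    log_counts = np.log10(counts + 1.0)
    bin_width = mz[1] - mz[0]
    # find_peaks expects the minimum peak separation in samples, not in Da.
    distance_bins = max(1, int(round(distance_da / bin_width)))
    idx, _ = find_peaks(
        log_counts,
        height=height,
        distance=distance_bins,
        prominence=prominence,
    )
    return mz[idx], counts[idx]


# Synthetic, noise-free spectrum: four Gaussian peaks on a flat background,
# with amplitudes loosely following the Fe isotope abundances quoted in the text.
mz = np.linspace(50.0, 60.0, 1001)  # 0.01 Da bins
counts = np.full_like(mz, 10.0)     # constant background
for center, amplitude in [(54.0, 5.8e5), (56.0, 9.18e6), (57.0, 2.1e5), (58.0, 3.0e4)]:
    counts += amplitude * np.exp(-0.5 * ((mz - center) / 0.02) ** 2)

peak_mz, peak_counts = find_spectrum_peaks(mz, counts)
print(np.round(peak_mz, 2))
```

On real data the thresholds would need to be checked against the spectrum at hand, since the height and prominence values only make sense relative to the chosen log scale and background level.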
By visual inspection, this set of parameters can capture the vast majority of peaks. In the manual procedure, users need to select the start and end position of each peak, as in Figure 3B. This procedure is referred to as "ranging" (ref. 28: Hudson et al., Ultramicroscopy 2011; 111: 480-486, https://doi.org/10.1016/j.ultramic.2010.11.007) and is prone to errors due to peak shapes, which depend in part on the instrument and the experimental conditions. For instance, the pulse energy and base temperature were shown to have an influence (refs. 29-31: Yao et al., Ultramicroscopy 2011; 111: 648-651; Tang et al., Ultramicroscopy 2010; 110: 836-843, https://doi.org/10.1016/j.ultramic.2010.03.003; La Fontaine et al., Ultramicroscopy 2015; 159: 354-359, https://doi.org/10.1016/j.ultramic.2015.02.005). We confine the task and assume that the peak position represents the peak, instead of its entire range. This assumption has limits in practice, notably when peaks exhibit long tails. Tails originate from either energy deficits or uncertainty in the time at which the ion left the specimen's surface (refs. 32-35: Müller and Krishnaswamy, Rev. Sci. Instr. 1974; 45: 1053-1059; Vurpillot et al., Appl. Phys. Lett. 2006; 88: 094105, https://doi.org/10.1063/1.2181654; Vurpillot et al., J. Phys. D Appl. Phys. 2009; 42: 125502, https://doi.org/10.1088/0022-3727/42/12/125502; Gault et al., Atom Probe Microscopy, Springer Series in Materials Science, Vol. 160, 2012, https://doi.org/10.1007/978-1-4614-3436-8) (see Discussion). The detected peaks serve as the input to ML-ToF.

In general terms, existing patterns can be categorized into two types: (1) an elemental pattern, exhibiting the isotopic abundance of a particular element, and (2) a molecular pattern, in which two or more elements are mixed in the isotope distribution. In this section, we introduce a systematic approach that identifies both types. Two main issues are addressed: the strategy to construct the database and the search for the most probable label. A pattern recognizer is designed following the protocol in Figure 4. Patterns are recognized from the ratios among their peaks.
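After the pattern is obtained, a charge-state search is performed to determine the exact ion. ML can only produce optimal results if it is trained with a good database. In our case, the database consists of three parts: the number of peaks, the isotopic abundance ratio, and the interpeak distance ratio (IDR). The IDR is the distance between neighboring peaks divided by the smallest distance in the group. For example, Fe+ has four peaks at 54, 56, 57, and 58 Da. So the IDR is (56 - 54)/(58 - 57):(57 - 56)/(58 - 57):(58 - 57)/(58 - 57) = 2:1:1. As such, Fe in the form of charge state 2+ has peaks at 27, 28, 28.5, and 29 Da and yields the same IDR, so we do not impose any constraints on the charge states of the elements. This is important, as the charge state can vary significantly (i.e., 1+, 2+, or 3+ state) within a single dataset (ref. 36: Kingham, Surf. Sci. 1982; 116: 273-301, https://doi.org/10.1016/0039-6028(82)90434-4). The database contains most elements (excluding inert gases) and some lanthanides. Currently, it covers 37 elements and 3 compounds, including S2 and C2. These compounds are included because of their strong tendency to form and because they are frequently observed experimentally. Further details regarding the database are given in Supplemental Information 1.1.

As seen in Figure 4, IDR matching is the first step toward full pattern recognition. A given IDR is used to filter the candidates, keeping those with a matched IDR. Subsequently, the ML model will examine these candidates. In practice, mass spectra contain calibration errors. Therefore, ML-ToF rounds the ratio's digits to 0, 1/4, 1/3, 2/3, 3/4, or 1 so that the IDR is correctly calculated.

The next step is concerned with classification. Classification is not a trivial task. Different elements sometimes aggregate into similar patterns, which makes it difficult to distinguish them. ML is naturally suited to data-driven classification tasks, thanks to its ability to learn and improve from experience automatically, without human intervention (ref. 16). Unlike a conventional yes/no answer, ML provides a list of answers with likelihoods. Even if an exact match cannot be found, ML provides a ranking of the most likely labels. In other words, ML looks at whether the pattern is partially retained and thus assigns a higher probability. For elements with two isotopes, ML-ToF calculates the measured peak ratio (rm = P1/P2) and compares it with the expected ones (rt). If the deviation |rm - rt|/rt exceeds a certain threshold (here chosen empirically as 0.3), then the peaks are classified as unidentified. For example, the isotopic abundance of Cu is 69.17:30.83, therefore rt = 69.17/30.83 = 2.24. ML-ToF does not assign the label Cu if rm goes outside [1.56, 2.91]. Monoisotopic elements (e.g., Al, As, Co) are a special case, since there is no isotopic pattern linking their charge states (e.g., Al+ at 27 Da and Al2+ at 13.5 Da).

In the present study, we selected the Light Gradient Boosting Machine (LightGBM) model. LightGBM belongs to the framework of the Gradient Boosting Decision Tree (GBDT) (ref. 37: Friedman, Ann. Stat. 2001; 29: 1189-1232, https://doi.org/10.2307/2699986). GBDT is an ensemble model that trains weaker learners in sequence. In each iteration, a decision tree learns from the error of the current iteration. Via gradient descent, every subsequent learner minimizes the difference between the actual output and the weighted sum of the predictions from previous iterations. The final model averages the weak learners. GBDT has achieved state-of-the-art performance in multiclass classification (ref. 38: Li, arXiv:1203.3491) and ranking tasks (ref. 39: Burges, Learning 2010; 11: 23-581, https://doi.org/10.1111/j.1467-8535.2010.01085.x). In the label-predicting multilabel setting, LightGBM tries to minimize the objective function L:

$$L = -\frac{1}{N}\left(\sum_{i=1}^{N} y_i \cdot \log(s_i)\right) \quad \text{(Equation 1)}$$

L is the cross-entropy. This ML-specific entropy formulation serves as a measure of the difference between the probability distributions of two models; N is the number of labels, y_i is the ground truth, and s_i denotes the predicted probability. L measures how far off the machine's prediction is from the ground truth. The smaller L is, the closer the prediction is to the ground truth. Zero would imply 100% accuracy. Using cross-entropy instead of the mean square error for a classification problem leads to faster training and improved generalization (ref. 40: Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag, 2006). In contrast to black-box models like a neural network, LightGBM enjoys a unique advantage; namely, it is an explainable model: one can not only obtain the prediction but also interpret it, for example as in Figure S1. Other explanations are given in Supplemental Information 1.2.

We generate 5,000 data points for training. During training, the total dataset is split into two parts: the first (around 4,000 data points) is used to train the model and the second (around 1,000 data points) to validate it. More details on the dataset construction are provided with the paper. Figures 5A-5D illustrate the training histories of the three-, four-, five-, and seven-peak models. The three-peak model achieves near-zero loss after about 200 iterations and plateaus at zero. The loss curves of the four-peak, five-peak, and seven-peak models show similar trends. Notably, the four-peak model also converges to zero, and its training stops early, at 500 iterations. Training and validating losses are almost identical in all cases, resulting in completely overlapping curves.

Figure 5: (A-D) Training histories of the three- (A), four- (B), five- (C), and seven-peak (D) models. The loss L is shown for the training and validation curves (indicated by training and valid_1, correspondingly). The two are almost the same and hence overlap completely.

A confusion matrix is a useful tool for visualizing the performance of a classification model: it enables a comparison of the predicted labels with the ground truth on the test dataset.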
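Two of the quantities defined earlier in this section, the interpeak distance ratio (IDR) and the tolerance test on a two-isotope abundance ratio, can be sketched as follows. This is a minimal sketch under stated assumptions, not the ML-ToF code: the function names are hypothetical, the deviation is taken as an absolute relative deviation (consistent with the quoted interval [1.56, 2.91] for Cu), and plain rounding to two decimals stands in for the fraction-based rounding described in the text.

```python
def interpeak_distance_ratio(peak_positions):
    """IDR: each neighbor-to-neighbor gap divided by the smallest gap.

    Rounding to two decimals is an illustrative simplification, not the
    rounding scheme described in the text.
    """
    gaps = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    smallest = min(gaps)
    return [round(gap / smallest, 2) for gap in gaps]


def ratio_within_tolerance(p1, p2, expected_ratio, threshold=0.3):
    """Check a measured two-peak ratio rm = p1 / p2 against the expected rt.

    Returns False when |rm - rt| / rt exceeds the threshold, i.e. when the
    peaks would be left unidentified.
    """
    measured_ratio = p1 / p2
    deviation = abs(measured_ratio - expected_ratio) / expected_ratio
    return deviation <= threshold


# Fe+ and Fe2+ peak positions from the text share the same IDR of 2:1:1.
idr_fe_1plus = interpeak_distance_ratio([54, 56, 57, 58])
idr_fe_2plus = interpeak_distance_ratio([27, 28, 28.5, 29])

# Cu: expected ratio from the isotopic abundances 69.17:30.83.
rt_cu = 69.17 / 30.83
cu_accepted = ratio_within_tolerance(2.0, 1.0, rt_cu)  # rm = 2.0, inside the interval
cu_rejected = ratio_within_tolerance(3.0, 1.0, rt_cu)  # rm = 3.0, outside the interval
```

Because the IDR depends only on relative spacings, dividing all peak positions by the charge state leaves it unchanged, which is why the same database entry can match Fe+ and Fe2+.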
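The confusion matrices (shown in Figures 6A-6D) indicate that the models can perfectly predict the labels. In addition, we introduced "redundancy" to deal with partial or overlapped patterns. For example, three patterns are assigned to Fe: (1) 54, 56, 57, and 58 Da at 5.8:91.8:2.1:0.3; (2) 54, 56, and 57 Da at 5.8:91.8:2.1; (3) 56, 57, and 58 Da at 91.8:2.1:0.3. The signal-to-noise ratio of a minor peak may be too low for it to be detected. Or the presence of Ni (major peak at 58 Da) destroys the pattern of Fe. Such a redundancy scheme guarantees a degree of robustness against these noise sources. The evaluation indicates that the models achieve high accuracy; small deviations stem from the randomness of the training/testing splitting [...]. A "probable label" is one with more than 90% certainty (as assigned by the model). A probable label is not yet an identified label [...]. In the last step, the labels are confirmed against the database; in this case, they were Fe+,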
Similar resources
High-Throughput STR Analysis by Time-of-Flight Mass Spectrometry
Rapid, cost-effective methods for high-throughput DNA analysis are needed to process samples currently being gathered for large criminal DNA databases around the world. Within the U.S., several states have sample backlogs of over 50,000 samples with limited funds and manpower to analyze these samples. Currently available slab gel or capillary electrophoresis instruments can handle only a few do...
Desorption Ionization-Time of Flight Mass Spectrometry
Advances in the Identification of Clinical Yeast Isolates Using Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry. Blake W. Buchan and Nathan A. Ledeboer, Department of Pathology, Medical College of Wisconsin, and Dynacare Laboratories, 9200 W. Wisconsin Ave., Milwaukee, WI 53226. Running Title: Identification of Yeast Using MALDI-TOF. Correspondence: N...
Hadamard Transform Time-of-Flight Mass Spectrometry
A new mode of operation of a time-of-flight mass spectrometer (TOFMS) is described and demonstrated. A continuous ion beam emerging from the ion source is accelerated and then modulated by a pseudorandom sequence of “on” and “off” pulses. The data acquisition period is set to match the period of the modulation sequence, and data are acquired synchronously with the modulation of the ion beam. Th...
Tandem time-of-flight mass spectrometry.
A new tandem time-of-flight (TOF-TOF) instrument has been developed by modifying a standard matrix-assisted laser desorption ionization (MALDI)-TOF instrument to make high-performance, high-energy collision-induced dissociation (CID) MALDI tandem mass spectrometry (MS) a practical reality. To optimize fragment spectra quality, the selected precursor ion is decelerated before entering a floating...
Multiplexed ion mobility spectrometry-orthogonal time-of-flight mass spectrometry.
Ion mobility spectrometry (IMS) coupled to orthogonal time-of-flight mass spectrometry (TOF) has shown significant promise for the characterization of complex biological mixtures. The enormous complexity of biological samples (e.g., from proteomics) and the need for both biological and technical analysis replicates imposes major challenges for multidimensional separation platforms with regard t...
Journal
Journal title: Patterns
Year: 2021
ISSN: 2666-3899
DOI: https://doi.org/10.1016/j.patter.2020.100192